Enrichment of Bilingual Dictionary through News Stream Data
نویسندگان
چکیده
Bilingual dictionaries are the key component of the cross-lingual similarity estimation methods. Usually such dictionary generation is accomplished by manual or automatic means. Automatic generation approaches include to exploit parallel or comparable data to derive dictionary entries. Such approaches require large amount of bilingual data in order to produce good quality dictionary. Many time the language pair does not have large bilingual comparable corpora and in such cases the best automatic dictionary is upper bounded by the quality and coverage of such corpora. In this work we propose a method which exploits continuous quasi-comparable corpora to derive term level associations for enrichment of such limited dictionary. Though we propose our experiments for English and Hindi, our approach can be easily extendable to other languages. We evaluated dictionary by manually computing the precision. In experiments we show our approach is able to derive interesting term level associations across languages.
منابع مشابه
EFL Translation Students' Perspective toward Using Bilingual Dictionary in Translation of Polysemous Words
This research presented the use of bilingual dictionary and addressed the EFL translation students' points of view on the use of bilingual dictionary in translating polysemous words (English to Persian). Moreo- ver, it aimed at finding the possible relationship between the effect of using bilingual dictionary by stu- dents in translating polysemous words and their achieved scores. In the study ...
متن کاملMonolingual and bilingual dictionary approaches to the enrichment of the Spanish WordNet with adjectives
We report on two different approaches to the incorporation of adjectives in Spanish WordNet based on automatic extraction techniques using EuroWordNet and machine-readable dictionaries. We show that a monolingual dictionary approach enables to exploit relations between different parts of speech and enrich the internal structure of the Spanish WordNet, while the methods based on bilingual dictio...
متن کاملAn Investigation into Bilingual Dictionary Use: Do the Frequency of Use and Type of Dictionary Make a Difference in L2 Writing Performance?
Bilingual dictionary use in L2 writing test performance has recently been the subject of debate. Opinions differ according to how the trait is understood and whether the system favors the process-oriented or product-oriented views towards the assessment and writing skill. Given the need for more empirical support, this study is aimed at investigating the availability of bilingual dictionary use...
متن کاملThe Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval
Bilingual term lists are extensively used as a resource for dictionary-based Cross-Language Information Retrieval (CLIR), in which the goal is to find documents written in one natural language based on queries that are expressed in another. This paper identifies eight types of terms that affect retrieval effectiveness in CLIR applications through their coverage by general-purpose bilingual term...
متن کاملSemi-automatic Compilation of Bilingual Lexicon Entries from Cross-Lingually Relevant News Articles on WWW News Sites
For the purpose of overcoming resource scarcity bottleneck in corpus-based translation knowledge acquisition research, this paper takes an approach of semi-automatically acquiring domain specific translation knowledge from the collection of bilingual news articles on WWW news sites. This paper presents results of applying standard co-occurrence frequency based techniques of estimating bilingual...
متن کامل